{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "530c973e-916d-4c9e-9365-e2d5306d7e3d",
   "metadata": {
    "tags": []
   },
   "source": [
    "# OpenAI Pydantic Program"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18461ba1-6978-4b5b-861e-6dceec36857b",
   "metadata": {
    "tags": []
   },
   "source": [
    "This guide shows how to generate structured data with the [OpenAI function calling API](https://openai.com/blog/function-calling-and-other-api-updates) via LlamaIndex. You only need to specify a Pydantic class that describes the desired output.\n",
    "\n",
    "We demonstrate two settings:\n",
    "- Extraction into an `Album` object (which can contain a list of `Song` objects)\n",
    "- Extraction into a `DirectoryTree` object (which can contain recursive `Node` objects)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fcefc79-68b4-4481-b1ef-a902fc12e4c8",
   "metadata": {},
   "source": [
    "## Extraction into `Album`\n",
    "\n",
    "This is a simple example of parsing an output into an `Album` schema, which can contain multiple songs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f7a83b49-5c34-45d5-8cf4-62f348fb1299",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from pydantic import BaseModel\n",
    "from typing import List\n",
    "\n",
    "from llama_index.program import OpenAIPydanticProgram"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0563f1ba-8086-4dcc-ba35-bfda31c45ae4",
   "metadata": {},
   "source": [
    "Define the output schema."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "42053ea8-2580-4639-9dcf-566e8427c44e",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "class Song(BaseModel):\n",
    "    \"\"\"Data model for a song.\"\"\"\n",
    "\n",
    "    title: str\n",
    "    length_seconds: int\n",
    "\n",
    "\n",
    "class Album(BaseModel):\n",
    "    \"\"\"Data model for an album.\"\"\"\n",
    "\n",
    "    name: str\n",
    "    artist: str\n",
    "    songs: List[Song]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4afff44e-a746-4b9f-85a9-72058bcdd29f",
   "metadata": {},
   "source": [
    "Define the OpenAI Pydantic program."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "fe756697-c299-4f9a-a108-944b6693f824",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "prompt_template_str = \"\"\"\\\n",
    "Generate an example album, with an artist and a list of songs. \\\n",
    "Using the movie {movie_name} as inspiration.\\\n",
    "\"\"\"\n",
    "program = OpenAIPydanticProgram.from_defaults(\n",
    "    output_cls=Album,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7be01dc-433e-4485-bab0-36a04c3afbcb",
   "metadata": {},
   "source": [
    "Run the program to get structured output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "25d02228-2907-4810-932e-83ec9fc71f6b",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Function call: Album with args: {\n",
      "  \"name\": \"The Shining\",\n",
      "  \"artist\": \"Various Artists\",\n",
      "  \"songs\": [\n",
      "    {\n",
      "      \"title\": \"Main Title\",\n",
      "      \"length_seconds\": 180\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Opening Credits\",\n",
      "      \"length_seconds\": 120\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"The Overlook Hotel\",\n",
      "      \"length_seconds\": 240\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Redrum\",\n",
      "      \"length_seconds\": 150\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Here's Johnny\",\n",
      "      \"length_seconds\": 200\n",
      "    }\n",
      "  ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "output = program(movie_name=\"The Shining\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27ec0777-28d5-494b-b419-daf6bce2b20e",
   "metadata": {
    "tags": []
   },
   "source": [
    "The output is a validated `Album` instance, a regular Pydantic object that can then be passed on to other functions or APIs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "3e51bcf4-e7df-47b9-b380-8e5b900a31e1",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Album(name='The Shining', artist='Various Artists', songs=[Song(title='Main Title', length_seconds=180), Song(title='Opening Credits', length_seconds=120), Song(title='The Overlook Hotel', length_seconds=240), Song(title='Redrum', length_seconds=150), Song(title=\"Here's Johnny\", length_seconds=200)])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "output"
   ]
  },
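  {
   "cell_type": "markdown",
   "id": "album-usage-note",
   "metadata": {},
   "source": [
    "Because `output` is an ordinary `Album` instance, its fields can be read with normal attribute access. As a small illustration, the cell below sums the song lengths. It assumes the cells above have been run so that `Album` and `output` exist; the helper name `album_duration_seconds` is made up for this example and is not part of LlamaIndex."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "album-usage-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "def album_duration_seconds(album: Album) -> int:\n",
    "    \"\"\"Return the total running time of all songs in the album.\"\"\"\n",
    "    return sum(song.length_seconds for song in album.songs)\n",
    "\n",
    "\n",
    "# `output` is the Album returned by the program call above\n",
    "total = album_duration_seconds(output)\n",
    "print(f\"{output.name}: {len(output.songs)} songs, {total} seconds\")"
   ]
  },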
  {
   "cell_type": "markdown",
   "id": "58eb69c1-bafe-49e4-9d14-f676662e74ac",
   "metadata": {},
   "source": [
    "## Extraction into `Album` (Streaming)\n",
    "\n",
    "We also support streaming a list of objects through the `stream_list` method, so you can iterate over the results one at a time.\n",
    "\n",
    "Full credit for this idea goes to the `openai_function_call` repo: https://github.com/jxnl/openai_function_call/tree/main/examples/streaming_multitask"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0a155e9c-a7e0-46b8-addb-7fbf844c0107",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"name\": \"The Journey\",\n",
      "  \"artist\": \"Unknown\",\n",
      "  \"songs\": [\n",
      "    {\n",
      "      \"title\": \"Lost in the Woods\",\n",
      "      \"length_seconds\": 240\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Endless Horizon\",\n",
      "      \"length_seconds\": 320\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Mystic Dreams\",\n",
      "      \"length_seconds\": 280\n",
      "    }\n",
      "  ]\n",
      "}\n",
      "{\n",
      "  \"name\": \"Electric Pulse\",\n",
      "  \"artist\": \"Synthwave Master\",\n",
      "  \"songs\": [\n",
      "    {\n",
      "      \"title\": \"Neon Nights\",\n",
      "      \"length_seconds\": 300\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Cyber City\",\n",
      "      \"length_seconds\": 280\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Digital Dreams\",\n",
      "      \"length_seconds\": 320\n",
      "    }\n",
      "  ]\n",
      "}\n",
      "{\n",
      "  \"name\": \"Soulful Serenade\",\n",
      "  \"artist\": \"Smooth Jazz Trio\",\n",
      "  \"songs\": [\n",
      "    {\n",
      "      \"title\": \"Midnight Groove\",\n",
      "      \"length_seconds\": 280\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Saxophone Serenade\",\n",
      "      \"length_seconds\": 320\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Chill Vibes\",\n",
      "      \"length_seconds\": 240\n",
      "    }\n",
      "  ]\n",
      "}\n",
      "{\n",
      "  \"name\": \"Rock Revolution\",\n",
      "  \"artist\": \"The Thunderbolts\",\n",
      "  \"songs\": [\n",
      "    {\n",
      "      \"title\": \"High Voltage\",\n",
      "      \"length_seconds\": 320\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Guitar Shredder\",\n",
      "      \"length_seconds\": 280\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Rock Anthem\",\n",
      "      \"length_seconds\": 300\n",
      "    }\n",
      "  ]\n",
      "}\n",
      "{\n",
      "  \"name\": \"Pop Sensation\",\n",
      "  \"artist\": \"The Starlets\",\n",
      "  \"songs\": [\n",
      "    {\n",
      "      \"title\": \"Catchy Melody\",\n",
      "      \"length_seconds\": 240\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Dancefloor Hit\",\n",
      "      \"length_seconds\": 300\n",
      "    },\n",
      "    {\n",
      "      \"title\": \"Pop Princess\",\n",
      "      \"length_seconds\": 280\n",
      "    }\n",
      "  ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "prompt_template_str = \"{input_str}\"\n",
    "program = OpenAIPydanticProgram.from_defaults(\n",
    "    output_cls=Album,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=False,\n",
    ")\n",
    "\n",
    "output = program.stream_list(input_str=\"make up 5 random albums\")\n",
    "for obj in output:\n",
    "    print(obj.json(indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d6159d2-c967-4523-ad3c-82bc17009b2c",
   "metadata": {},
   "source": [
    "## Extraction into `DirectoryTree` object\n",
    "\n",
    "This example is directly inspired by jxnl's repo: https://github.com/jxnl/openai_function_call.\n",
    "\n",
    "That repository shows how to use OpenAI's function API to parse recursive Pydantic objects. The main requirement is to \"wrap\" the recursive Pydantic object in a non-recursive one.\n",
    "\n",
    "Here we show an example in a \"directory\" setting, where a `DirectoryTree` object wraps recursive `Node` objects in order to parse a file structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "b58f6a12-3f5c-414b-80df-4492f6e18be5",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# NOTE: defining recursive objects in a notebook causes errors,\n",
    "# so the models are imported from a separate `directory` module.\n",
    "from directory import DirectoryTree, Node"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "bc1c7eeb-f0a3-4d72-86ee-c6b63e01b0ff",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'title': 'DirectoryTree',\n",
       " 'description': 'Container class representing a directory tree.\\n\\nArgs:\\n    root (Node): The root node of the tree.',\n",
       " 'type': 'object',\n",
       " 'properties': {'root': {'title': 'Root',\n",
       "   'description': 'Root folder of the directory tree',\n",
       "   'allOf': [{'$ref': '#/definitions/Node'}]}},\n",
       " 'required': ['root'],\n",
       " 'definitions': {'NodeType': {'title': 'NodeType',\n",
       "   'description': 'Enumeration representing the types of nodes in a filesystem.',\n",
       "   'enum': ['file', 'folder'],\n",
       "   'type': 'string'},\n",
       "  'Node': {'title': 'Node',\n",
       "   'description': 'Class representing a single node in a filesystem. Can be either a file or a folder.\\nNote that a file cannot have children, but a folder can.\\n\\nArgs:\\n    name (str): The name of the node.\\n    children (List[Node]): The list of child nodes (if any).\\n    node_type (NodeType): The type of the node, either a file or a folder.',\n",
       "   'type': 'object',\n",
       "   'properties': {'name': {'title': 'Name',\n",
       "     'description': 'Name of the folder',\n",
       "     'type': 'string'},\n",
       "    'children': {'title': 'Children',\n",
       "     'description': 'List of children nodes, only applicable for folders, files cannot have children',\n",
       "     'type': 'array',\n",
       "     'items': {'$ref': '#/definitions/Node'}},\n",
       "    'node_type': {'description': 'Either a file or folder, use the name to determine which it could be',\n",
       "     'default': 'file',\n",
       "     'allOf': [{'$ref': '#/definitions/NodeType'}]}},\n",
       "   'required': ['name']}}}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DirectoryTree.schema()"
   ]
  },
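  {
   "cell_type": "markdown",
   "id": "directory-module-sketch",
   "metadata": {},
   "source": [
    "The `directory` module itself is not shown in this notebook. The following is only a sketch of what it might contain, reconstructed from the schema printed above. It assumes the pydantic v1 API (`update_forward_refs`), shortens the docstrings, and may differ from the actual module.\n",
    "\n",
    "```python\n",
    "from enum import Enum\n",
    "from typing import List\n",
    "\n",
    "from pydantic import BaseModel, Field\n",
    "\n",
    "\n",
    "class NodeType(str, Enum):\n",
    "    \"\"\"Enumeration representing the types of nodes in a filesystem.\"\"\"\n",
    "\n",
    "    FILE = \"file\"\n",
    "    FOLDER = \"folder\"\n",
    "\n",
    "\n",
    "class Node(BaseModel):\n",
    "    \"\"\"A single node in a filesystem: either a file or a folder.\n",
    "\n",
    "    A file cannot have children, but a folder can.\n",
    "    \"\"\"\n",
    "\n",
    "    name: str = Field(..., description=\"Name of the folder\")\n",
    "    # The string annotation makes the model refer to itself.\n",
    "    children: List[\"Node\"] = Field(\n",
    "        default_factory=list,\n",
    "        description=(\n",
    "            \"List of children nodes, only applicable for folders, \"\n",
    "            \"files cannot have children\"\n",
    "        ),\n",
    "    )\n",
    "    node_type: NodeType = Field(\n",
    "        default=NodeType.FILE,\n",
    "        description=(\n",
    "            \"Either a file or folder, use the name to determine which it could be\"\n",
    "        ),\n",
    "    )\n",
    "\n",
    "\n",
    "# Resolve the self-reference in `children` (pydantic v1 API, assumed here).\n",
    "Node.update_forward_refs()\n",
    "\n",
    "\n",
    "class DirectoryTree(BaseModel):\n",
    "    \"\"\"Non-recursive container that wraps the recursive Node model.\"\"\"\n",
    "\n",
    "    root: Node = Field(..., description=\"Root folder of the directory tree\")\n",
    "```\n",
    "\n",
    "`DirectoryTree` is the non-recursive wrapper: it holds a single `root` field, while all of the recursion lives in `Node.children`."
   ]
  },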
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "02c4a7a1-f145-40bc-83b8-4153a531a8eb",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "program = OpenAIPydanticProgram.from_defaults(\n",
    "    output_cls=DirectoryTree,\n",
    "    prompt_template_str=\"{input_str}\",\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "c88cf49f-a52f-4415-bddc-14d70c897629",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Function call: DirectoryTree with args: {\n",
      "  \"root\": {\n",
      "    \"name\": \"root\",\n",
      "    \"children\": [\n",
      "      {\n",
      "        \"name\": \"folder1\",\n",
      "        \"children\": [\n",
      "          {\n",
      "            \"name\": \"file1.txt\",\n",
      "            \"children\": [],\n",
      "            \"node_type\": \"file\"\n",
      "          },\n",
      "          {\n",
      "            \"name\": \"file2.txt\",\n",
      "            \"children\": [],\n",
      "            \"node_type\": \"file\"\n",
      "          }\n",
      "        ],\n",
      "        \"node_type\": \"folder\"\n",
      "      },\n",
      "      {\n",
      "        \"name\": \"folder2\",\n",
      "        \"children\": [\n",
      "          {\n",
      "            \"name\": \"file3.txt\",\n",
      "            \"children\": [],\n",
      "            \"node_type\": \"file\"\n",
      "          },\n",
      "          {\n",
      "            \"name\": \"subfolder1\",\n",
      "            \"children\": [\n",
      "              {\n",
      "                \"name\": \"file4.txt\",\n",
      "                \"children\": [],\n",
      "                \"node_type\": \"file\"\n",
      "              }\n",
      "            ],\n",
      "            \"node_type\": \"folder\"\n",
      "          }\n",
      "        ],\n",
      "        \"node_type\": \"folder\"\n",
      "      }\n",
      "    ],\n",
      "    \"node_type\": \"folder\"\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "input_str = \"\"\"\n",
    "root\n",
    "├── folder1\n",
    "│   ├── file1.txt\n",
    "│   └── file2.txt\n",
    "└── folder2\n",
    "    ├── file3.txt\n",
    "    └── subfolder1\n",
    "        └── file4.txt\n",
    "\"\"\"\n",
    "\n",
    "output = program(input_str=input_str)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0bf23c34-2cc7-4d0f-9389-75f7cf6bf9a2",
   "metadata": {},
   "source": [
    "The output is a full `DirectoryTree` structure with recursive `Node` objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "3885032f-0f3a-4afb-9157-54851e810843",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DirectoryTree(root=Node(name='root', children=[Node(name='folder1', children=[Node(name='file1.txt', children=[], node_type=<NodeType.FILE: 'file'>), Node(name='file2.txt', children=[], node_type=<NodeType.FILE: 'file'>)], node_type=<NodeType.FOLDER: 'folder'>), Node(name='folder2', children=[Node(name='file3.txt', children=[], node_type=<NodeType.FILE: 'file'>), Node(name='subfolder1', children=[Node(name='file4.txt', children=[], node_type=<NodeType.FILE: 'file'>)], node_type=<NodeType.FOLDER: 'folder'>)], node_type=<NodeType.FOLDER: 'folder'>)], node_type=<NodeType.FOLDER: 'folder'>))"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "output"
   ]
  },
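  {
   "cell_type": "markdown",
   "id": "tree-walk-note",
   "metadata": {},
   "source": [
    "The nested `Node` objects can be traversed like any other tree. The cell below is a minimal sketch that assumes only that each node exposes `name` and `children`, as in the schema above, and that `output` holds the `DirectoryTree` from the previous cells. `iter_paths` is a hypothetical helper, not part of LlamaIndex."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "tree-walk-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "def iter_paths(node, prefix=\"\"):\n",
    "    \"\"\"Yield the slash-separated path of `node` and of all its descendants.\"\"\"\n",
    "    path = f\"{prefix}/{node.name}\" if prefix else node.name\n",
    "    yield path\n",
    "    # Depth-first walk: recurse into each child with the current path as prefix\n",
    "    for child in node.children:\n",
    "        yield from iter_paths(child, path)\n",
    "\n",
    "\n",
    "for path in iter_paths(output.root):\n",
    "    print(path)"
   ]
  },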
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75e712e2-5f84-4899-a000-54a7a53fc72c",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index_v2",
   "language": "python",
   "name": "llama_index_v2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
